  • Replacing parts of a tax code ID with asterisks, dots, and a hyphen

    Hi all. I have a dataset with information on candidates who ran in the 2018 general election in Brazil. Each candidate has a tax code ID number called “CPF”. A CPF has 11 digits and is always written in this format:

    123.456.789-10

    All CPF numbers have two dots, plus one hyphen separating the last two digits, as shown above.

    I have another dataset that contains information for some of the candidates in the first dataset. It also contains CPF numbers. However, these CPFs mask some digits with asterisks (for data confidentiality purposes):

    E.g.: ***.438.750-**

    What I have in my dataset: a string variable nr_cpf_candidato with eleven digits, without the dots and the hyphen.

    What I need: to have all these CPFs reformatted to follow the data confidentiality format (as above), so that I can merge the two datasets on the reformatted CPF.

    For instance: 12345678910 in the original dataset needs to be converted to ***.456.789-**

    Question: How can I do that?

    Part of the data follows below. Thank you.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str61 nm_candidato str11 nr_cpf_candidato
    "AMANDA BARBOSA DA SILVA"              "10153045400"
    "CARLOS ROBERTO DE ALMEIDA"            "53154665749"
    "ROBERTO CAUNETO PICORELI"             "02128498910"
    "DENISE RODRIGUES MATOS"               "03861833760"
    "JANIER MOTA SANTOS PRIMO"             "51587874504"
    "IZAC GONÇALVES DOS SANTOS"           "06950326661"
    "ALCIONY REGIA SOARES SANTOS"          "96699000420"
    "ELZA LUIZ DE QUEIROZ"                 "44615361653"
    "FLÁVIO DA SILVA DAMIANI"             "69136955949"
    "FABIANO VERGINE TEIXEIRA DE SIQUEIRA" "22013906811"
    "FRANCISCA GEANY FELIPE DO NASCIMENTO" "82880891353"
    "ÁLVARO DAMIÃO VIEIRA DA PAZ"        "67336361668"
    "OSVALDO LUIS FERREIRA DE SOUZA"       "83069348491"
    "IVETE DA SILVA"                       "71279563400"
    "DENIS DUCK"                           "02920126830"
    "CLAUDIO FERREIRA SILVA"               "13471410813"
    "MANOEL LEOCADIO DE MENEZES"           "31471382249"
    "BRUNO ALBUQUERQUE TOLEDO"             "01091273405"
    "JOCIANA MARIA DE SOUSA"               "84000058304"
    "EVANDRO NEVIO ARGENTON"               "49384104949"
    "LUCIMAURO ANTONIO ALVES OLIVEIRA"     "62118706472"
    "FABIO LISANDRO DE LIMA BARROS"        "48226645468"
    "VERA BISPO DOS SANTOS"                "91083451120"
    "CLAUDINETE SENA CONCEIÇÃO"          "58326880906"
    "SEBASTIÃO DA COSTA CANDIDO"          "03539167730"
    end

  • #2
    Code:
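    * Stop with an error unless every CPF has exactly 11 characters after trimming blanks
    assert length(strtrim(nr_cpf_candidato))==11
    * Keep characters 4-6 and 7-9; mask the first three and the last two
    gen wanted= "***."+ substr(strtrim(nr_cpf_candidato), 4, 3)+ "."+substr(strtrim(nr_cpf_candidato), 7, 3)+"-**"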
    Result:

    Code:
    . l, sep(0)
    
         +---------------------------------------------------------------------+
         |                         nm_candidato   nr_cpf_ca~o           wanted |
         |---------------------------------------------------------------------|
      1. |              AMANDA BARBOSA DA SILVA   10153045400   ***.530.454-** |
      2. |            CARLOS ROBERTO DE ALMEIDA   53154665749   ***.546.657-** |
      3. |             ROBERTO CAUNETO PICORELI   02128498910   ***.284.989-** |
      4. |               DENISE RODRIGUES MATOS   03861833760   ***.618.337-** |
      5. |             JANIER MOTA SANTOS PRIMO   51587874504   ***.878.745-** |
      6. |            IZAC GONÇALVES DOS SANTOS   06950326661   ***.503.266-** |
      7. |          ALCIONY REGIA SOARES SANTOS   96699000420   ***.990.004-** |
      8. |                 ELZA LUIZ DE QUEIROZ   44615361653   ***.153.616-** |
      9. |              FLÁVIO DA SILVA DAMIANI   69136955949   ***.369.559-** |
     10. | FABIANO VERGINE TEIXEIRA DE SIQUEIRA   22013906811   ***.139.068-** |
     11. | FRANCISCA GEANY FELIPE DO NASCIMENTO   82880891353   ***.808.913-** |
     12. |          ÁLVARO DAMIÃO VIEIRA DA PAZ   67336361668   ***.363.616-** |
     13. |       OSVALDO LUIS FERREIRA DE SOUZA   83069348491   ***.693.484-** |
     14. |                       IVETE DA SILVA   71279563400   ***.795.634-** |
     15. |                           DENIS DUCK   02920126830   ***.201.268-** |
     16. |               CLAUDIO FERREIRA SILVA   13471410813   ***.714.108-** |
     17. |           MANOEL LEOCADIO DE MENEZES   31471382249   ***.713.822-** |
     18. |             BRUNO ALBUQUERQUE TOLEDO   01091273405   ***.912.734-** |
     19. |               JOCIANA MARIA DE SOUSA   84000058304   ***.000.583-** |
     20. |               EVANDRO NEVIO ARGENTON   49384104949   ***.841.049-** |
     21. |     LUCIMAURO ANTONIO ALVES OLIVEIRA   62118706472   ***.187.064-** |
     22. |        FABIO LISANDRO DE LIMA BARROS   48226645468   ***.266.454-** |
     23. |                VERA BISPO DOS SANTOS   91083451120   ***.834.511-** |
     24. |            CLAUDINETE SENA CONCEIÇÃO   58326880906   ***.268.809-** |
     25. |           SEBASTIÃO DA COSTA CANDIDO   03539167730   ***.391.677-** |
         +---------------------------------------------------------------------+
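
    For the merge described in #1, a minimal sketch follows. It assumes the second dataset is saved as a hypothetical file masked.dta that holds the masked CPF in a hypothetical string variable cpf_masked, with one row per masked CPF; neither name comes from the thread. Because the masked format keeps only six of the eleven digits, two different candidates can end up with the same masked CPF, so it is probably worth checking for duplicates before trusting the match.

    ```stata
    * Stricter check than length alone: exactly 11 digits
    * (ustrregexm() requires Stata 14 or later)
    assert ustrregexm(strtrim(nr_cpf_candidato), "^[0-9]{11}$")

    * The masked CPF keeps only 6 digits, so it may not identify candidates uniquely
    duplicates report wanted

    * Hypothetical names: masked.dta and cpf_masked are assumptions, not from the thread
    rename wanted cpf_masked
    merge m:1 cpf_masked using masked, keep(master match)
    ```

    If masked.dta itself contains repeated masked CPFs, merge m:1 will stop with an error, which is a useful signal that the masked CPF alone is not a safe key and that a second variable (for example, the candidate name) may be needed.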



    • #3
      Thank you very much, Andrew. It worked perfectly. I will read more about strtrim.
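
      As a quick illustration of what strtrim() does in #2: it removes leading and trailing blanks and leaves the rest of the string alone, which is why the length check and the substr() positions are not thrown off by stray spaces. The padded string below is a made-up example, not taken from the data.

      ```stata
      * strtrim() strips leading and trailing blanks only
      display "[" strtrim("  12345678910 ") "]"
      * shows [12345678910]

      * After trimming, the length is the 11 digits alone
      display strlen(strtrim("  12345678910 "))
      * shows 11
      ```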
